HYPERFLEET-752 | ci: Improve E2E CI Test deployment logic#51

Merged
openshift-merge-bot[bot] merged 1 commit into openshift-hyperfleet:main from yingzhanredhat:hyperfleet-752
Mar 23, 2026

Conversation

@yingzhanredhat
Contributor

@yingzhanredhat yingzhanredhat commented Mar 20, 2026

Summary by CodeRabbit

  • New Features

    • CLI option to customize the debug log directory.
    • Helper to capture and archive comprehensive debug artifacts on failure.
  • Improvements

    • Automatic capture and archival of debug logs on deployment or health-check failures.
    • Failed deployments now attempt timed cleanup to remove partial releases.
    • Releases now include discovery labels and randomized suffixes to avoid name collisions.
  • Bug Fixes

    • Uninstall now finds and removes all matching releases, reducing leftovers.

@yingzhanredhat yingzhanredhat requested a review from yasun1 March 20, 2026 02:20
@openshift-ci openshift-ci bot requested review from crizzo71 and rafabene March 20, 2026 02:20
@coderabbitai

coderabbitai bot commented Mar 20, 2026

Walkthrough

Adds centralized debug-log capture and automated cleanup to CLM deploy scripts. Introduces DEBUG_LOG_DIR (default: ${PROJECT_ROOT}/.debug-work) and CLI flag --debug-log-dir. Adds capture_debug_logs(namespace, selector, component_name, output_dir) which collects pod logs, descriptions, events, workloads and services into a timestamped directory. Installer flows (adapter, api, sentinel) call capture_debug_logs on Helm failures or health-check failures and then attempt helm uninstall <release> -n <ns> --wait --timeout 5m. Adapter installs now append an 8‑char random suffix to release names, add Helm labels (adapter-resource-type, adapter-name), and uninstalls query releases by those labels (with a prefix fallback).
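A minimal sketch of what a helper with the signature capture_debug_logs(namespace, selector, component_name, output_dir) could look like. This is not the code from deploy-scripts/lib/common.sh: the file names, the --tail value and the fallback directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real capture_debug_logs may differ in every detail.
capture_debug_logs() {
    local namespace="$1" selector="$2" component_name="$3"
    local output_dir="${4:-${DEBUG_LOG_DIR:-./.debug-work}}"
    local stamp target

    stamp=$(date +"%Y%m%d-%H%M%S")
    target="${output_dir}/${component_name}-${stamp}"
    mkdir -p "${target}" || { echo "cannot create ${target}" >&2; return 1; }

    # Best-effort collection: a failing kubectl call must not stop the rest.
    kubectl logs -n "${namespace}" -l "${selector}" --all-containers --tail=500 \
        > "${target}/pods.log" 2>&1 || true
    kubectl describe pods -n "${namespace}" -l "${selector}" \
        > "${target}/describe.txt" 2>&1 || true
    kubectl get events -n "${namespace}" --sort-by=.lastTimestamp \
        > "${target}/events.txt" 2>&1 || true
    kubectl get deployments,statefulsets,services -n "${namespace}" -o wide \
        > "${target}/workloads.txt" 2>&1 || true

    # Print the capture directory so the caller can report or archive it.
    echo "${target}"
}
```

The function only writes files and prints the directory it used, so a caller can decide separately whether to archive the result or abort the deployment.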

Sequence Diagram

sequenceDiagram
    actor User
    participant DeployScript as Deploy Script
    participant Helm
    participant Kubernetes
    participant DebugLogs as Debug Log Capture

    User->>DeployScript: Run install (optional --debug-log-dir)
    DeployScript->>Helm: helm upgrade --install <release>-<rand>
    
    alt Helm install succeeds
        Helm-->>DeployScript: Release created
        DeployScript->>Kubernetes: Run health check probe
        alt Health check passes
            Kubernetes-->>DeployScript: Healthy
            DeployScript-->>User: Installation complete
        else Health check fails
            Kubernetes-->>DeployScript: Unhealthy
            DeployScript->>DebugLogs: capture_debug_logs(namespace, selector, release, DEBUG_LOG_DIR)
            DebugLogs->>Kubernetes: kubectl logs/describe/events/workloads/services...
            Kubernetes-->>DebugLogs: Diagnostics collected
            DebugLogs-->>DeployScript: Logs saved
            DeployScript->>Helm: helm uninstall <release>-<rand> -n <ns> --wait --timeout 5m
            Helm-->>DeployScript: Uninstall result
            DeployScript-->>User: Installation failed
        end
    else Helm install fails
        Helm-->>DeployScript: Install failed
        DeployScript->>Helm: helm list -n <ns> --selector adapter-resource-type=...,adapter-name=... -q
        alt Matching releases found
            Helm-->>DeployScript: Releases listed
            DeployScript->>DebugLogs: capture_debug_logs(namespace, selector, release, DEBUG_LOG_DIR)
            DebugLogs->>Kubernetes: kubectl logs/describe/events/workloads/services...
            DebugLogs-->>DeployScript: Logs saved
            DeployScript->>Helm: helm uninstall <matching-release> -n <ns> --wait --timeout 5m
            Helm-->>DeployScript: Uninstall result(s)
        else No labeled releases
            DeployScript->>Helm: fallback: list by release-name prefix
            Helm-->>DeployScript: Releases listed
            DeployScript->>Helm: helm uninstall <matching-release> -n <ns> --wait --timeout 5m
        end
        DeployScript-->>User: Installation failed
    end
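The flow in the diagram can be sketched roughly as follows. The function names, the chart argument and the app.kubernetes.io/instance selector are hypothetical, and capture_debug_logs and health_check are assumed to be defined elsewhere. For brevity the sketch always cleans up by label lookup, whereas the diagram uninstalls the exact release on a health-check failure. The --labels flag requires a reasonably recent Helm 3 release.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the install-then-cleanup flow; not the PR's code.

# List releases by discovery labels, falling back to a name-prefix match.
find_adapter_releases() {
    local namespace="$1" resource_type="$2" adapter_name="$3"
    local releases
    releases=$(helm list -n "${namespace}" -q \
        --selector "adapter-resource-type=${resource_type},adapter-name=${adapter_name}")
    if [ -z "${releases}" ]; then
        releases=$(helm list -n "${namespace}" -q --filter "^${adapter_name}-")
    fi
    printf '%s\n' "${releases}"
}

# Install one release; on Helm or health-check failure, capture logs,
# uninstall every matching release, and report failure to the caller.
install_with_cleanup() {
    local namespace="$1" release="$2" chart="$3" resource_type="$4" adapter_name="$5"
    local rel

    if helm upgrade --install "${release}" "${chart}" -n "${namespace}" \
            --labels "adapter-resource-type=${resource_type},adapter-name=${adapter_name}" \
        && health_check "${namespace}" "${release}"; then
        return 0
    fi

    # The pod selector below is an assumption about how the chart labels pods.
    capture_debug_logs "${namespace}" "app.kubernetes.io/instance=${release}" \
        "${release}" "${DEBUG_LOG_DIR:-./.debug-work}" || true

    for rel in $(find_adapter_releases "${namespace}" "${resource_type}" "${adapter_name}"); do
        helm uninstall "${rel}" -n "${namespace}" --wait --timeout 5m || true
    done
    return 1
}
```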

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly identifies the main change: improving E2E CI test deployment logic with reference to the ticket. However, it is somewhat generic and doesn't specifically convey that the changes involve debug logging capture, random Helm release suffixes, and deployment cleanup improvements.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
deploy-scripts/deploy-clm.sh (1)

445-459: Debug log preservation logic may fail if DEBUG_LOG_DIR is outside WORK_DIR.

If a user specifies --debug-log-dir /var/log/hyperfleet (a path outside WORK_DIR), this preservation logic unnecessarily moves the logs to a temp directory and back, recreating an empty WORK_DIR along the way. Because both mv commands end in || true, a failed move in either direction goes unnoticed.

Consider adding a check to only preserve when DEBUG_LOG_DIR is a subdirectory of WORK_DIR:

♻️ Proposed fix
     # Clean up work directory (but preserve debug logs)
     if [[ "${DRY_RUN}" == "false" && "${VERBOSE}" == "false" ]]; then
         log_verbose "Cleaning up work directory"
         # Preserve debug logs if they exist
-        if [[ -d "${DEBUG_LOG_DIR}" ]]; then
+        if [[ -d "${DEBUG_LOG_DIR}" && "${DEBUG_LOG_DIR}" == "${WORK_DIR}"/* ]]; then
             local temp_debug_dir
             temp_debug_dir=$(mktemp -d)
             mv "${DEBUG_LOG_DIR}" "${temp_debug_dir}/debug-logs" 2>/dev/null || true
             rm -rf "${WORK_DIR}"
             mkdir -p "${WORK_DIR}"
             mv "${temp_debug_dir}/debug-logs" "${DEBUG_LOG_DIR}" 2>/dev/null || true
             rm -rf "${temp_debug_dir}"
         else
             rm -rf "${WORK_DIR}"
         fi
     fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-scripts/deploy-clm.sh` around lines 445 - 459, The preservation logic
for DEBUG_LOG_DIR may move or touch paths outside WORK_DIR; update the cleanup
block to first resolve paths (e.g., realpath) and only perform the temp-preserve
dance when DEBUG_LOG_DIR exists and its resolved path is under WORK_DIR's
resolved path (starts-with check); if DEBUG_LOG_DIR is outside WORK_DIR or
doesn't exist, skip moving it and simply rm -rf "${WORK_DIR}" as before;
reference the DEBUG_LOG_DIR and WORK_DIR variables and the existing
temp_debug_dir usage when making this change.
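One way to express the "resolved path is under WORK_DIR" check is sketched below. The helper name is_under_dir is hypothetical. It resolves paths with cd and pwd -P rather than realpath, since realpath is not available on every platform, so both directories must already exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when $1 resolves to a path strictly
# inside $2. Symlinks are resolved first; a parent of "/" is not handled.
is_under_dir() {
    local child parent
    child=$(CDPATH= cd -- "$1" 2>/dev/null && pwd -P) || return 1
    parent=$(CDPATH= cd -- "$2" 2>/dev/null && pwd -P) || return 1
    case "${child}" in
        "${parent}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use in the cleanup block:
#   if is_under_dir "${DEBUG_LOG_DIR}" "${WORK_DIR}"; then
#       ... preserve the logs, then remove WORK_DIR ...
#   else
#       rm -rf "${WORK_DIR}"
#   fi
```

A missing DEBUG_LOG_DIR makes the helper return non-zero, which routes the caller to the plain rm -rf branch.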
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de36bfd4-cfa9-4f04-a8bc-5b0f18af6067

📥 Commits

Reviewing files that changed from the base of the PR and between f73bb04 and 1ae2387.

📒 Files selected for processing (5)
  • deploy-scripts/deploy-clm.sh
  • deploy-scripts/lib/adapter.sh
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/common.sh
  • deploy-scripts/lib/sentinel.sh

@yingzhanredhat
Contributor Author

/test lint

@yasun1
Contributor

yasun1 commented Mar 20, 2026

Code review

Found 1 issue:

  1. Race condition in debug log directory preservation - The cleanup logic uses a complex move-delete-recreate-move sequence that can silently lose debug logs if any step fails. Both mv commands use || true which suppresses errors, and the temp directory gets deleted regardless of whether the move-back succeeded.

if [[ "${DRY_RUN}" == "false" && "${VERBOSE}" == "false" ]]; then
    log_verbose "Cleaning up work directory"
    # Preserve debug logs if they exist
    if [[ -d "${DEBUG_LOG_DIR}" ]]; then
        local temp_debug_dir
        temp_debug_dir=$(mktemp -d)
        mv "${DEBUG_LOG_DIR}" "${temp_debug_dir}/debug-logs" 2>/dev/null || true
        rm -rf "${WORK_DIR}"
        mkdir -p "${WORK_DIR}"
        mv "${temp_debug_dir}/debug-logs" "${DEBUG_LOG_DIR}" 2>/dev/null || true
        rm -rf "${temp_debug_dir}"
    else
        rm -rf "${WORK_DIR}"
    fi
fi
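A possible shape for a cleanup step that does not swallow errors is sketched below. The function name cleanup_work_dir and its two arguments are hypothetical, and the sketch assumes the debug directory normally lives inside the work directory. The idea is that every step is checked and the stash directory is only removed once the logs are back in place.

```shell
#!/usr/bin/env bash
# Hypothetical rewrite of the cleanup step: stop at the first failure and
# never delete the only remaining copy of the logs.
cleanup_work_dir() {
    local work_dir="$1" debug_dir="$2"
    local stash

    # Nothing to preserve: plain removal.
    if [ ! -d "${debug_dir}" ]; then
        rm -rf "${work_dir}"
        return
    fi

    stash=$(mktemp -d) || return 1
    if ! mv "${debug_dir}" "${stash}/debug-logs"; then
        echo "could not stash ${debug_dir}; leaving ${work_dir} untouched" >&2
        rmdir "${stash}"
        return 1
    fi

    rm -rf "${work_dir}"
    if mkdir -p "$(dirname "${debug_dir}")" \
        && mv "${stash}/debug-logs" "${debug_dir}"; then
        rmdir "${stash}"
    else
        # Keep the stash so the logs can still be recovered by hand.
        echo "could not restore debug logs; they remain in ${stash}/debug-logs" >&2
        return 1
    fi
}
```

If the restore fails, the function reports where the logs ended up instead of deleting the temporary directory.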

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-scripts/lib/common.sh`:
- Around line 195-201: The debug-log capture currently allows empty WORK_DIR and
ignores failures from directory creation and capture commands; update the logic
around the output_dir and capture steps so failures are propagated: validate
that WORK_DIR (used when computing output_dir) is non-empty before defaulting,
check the return code of mkdir -p for "${output_dir}" and exit non-zero with an
error log via log_section/processLogger on failure, and similarly add explicit
error checks for the capture commands referenced around lines 259-264 so any
failed capture causes the function to log the error and return a non-zero exit
code instead of silently succeeding.
- Around line 202-205: The current timestamp used to build log_prefix (variables
timestamp and log_prefix, referencing output_dir and component_name) has only
second-level granularity and can collide; update timestamp generation to be
collision-resistant by including higher-resolution time and a unique process
identifier (e.g., use date +"%Y%m%d-%H%M%S-%N" and append $$ or similar) or use
a safe unique generator (mktemp/uuid) and then rebuild
log_prefix="${output_dir}/${component_name}-${timestamp}" so concurrent runs
cannot overwrite each other.
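Both inline comments could be addressed by a small helper along these lines. The name make_capture_dir is hypothetical, and the sketch uses mktemp -d for uniqueness instead of date +%N, which is not supported by every date implementation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a unique, per-invocation capture directory
# and fail loudly when that is not possible.
make_capture_dir() {
    local output_dir="$1" component_name="$2"
    local stamp

    if [ -z "${output_dir}" ] || [ -z "${component_name}" ]; then
        echo "make_capture_dir: output dir and component name are required" >&2
        return 1
    fi
    if ! mkdir -p "${output_dir}"; then
        echo "make_capture_dir: cannot create ${output_dir}" >&2
        return 1
    fi

    stamp=$(date +"%Y%m%d-%H%M%S")
    # mktemp replaces the trailing XXXXXX with random characters, so two
    # runs within the same second still get distinct directories.
    mktemp -d "${output_dir}/${component_name}-${stamp}-XXXXXX"
}
```

A caller would then write its capture files under the printed directory, for example `log_prefix=$(make_capture_dir "${output_dir}" "${component_name}") || return 1`.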

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3d925e03-58fb-4db5-bc75-505579580ac6

📥 Commits

Reviewing files that changed from the base of the PR and between 1ae2387 and 4558a57.

📒 Files selected for processing (5)
  • deploy-scripts/deploy-clm.sh
  • deploy-scripts/lib/adapter.sh
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/common.sh
  • deploy-scripts/lib/sentinel.sh
🚧 Files skipped from review as they are similar to previous changes (4)
  • deploy-scripts/deploy-clm.sh
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/sentinel.sh
  • deploy-scripts/lib/adapter.sh

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy-scripts/deploy-clm.sh`:
- Around line 101-103: The help text for the --debug-log-dir flag is
inconsistent with the actual default
(DEBUG_LOG_DIR="${DEBUG_LOG_DIR:-${PROJECT_ROOT}/.debug-work}"); update the
documented default in the option/usage text for --debug-log-dir to
"${PROJECT_ROOT}/.debug-work" so the help matches the implementation (or
alternatively change the DEBUG_LOG_DIR assignment to use ${WORK_DIR}/debug-logs
if you prefer the documented path); ensure you modify the help string that
references ${WORK_DIR}/debug-logs and keep the flag name --debug-log-dir and
variable DEBUG_LOG_DIR in sync.
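One way to keep the help text and the default from drifting apart is to derive both from a single variable. In the sketch below, DEFAULT_DEBUG_LOG_DIR and print_usage are hypothetical names, and the PROJECT_ROOT fallback exists only to make the excerpt runnable on its own.

```shell
#!/usr/bin/env bash
# Hypothetical excerpt; the real script sets PROJECT_ROOT itself.
PROJECT_ROOT="${PROJECT_ROOT:-$(pwd)}"

# Single source of truth for the default, used by both the assignment
# and the usage text.
DEFAULT_DEBUG_LOG_DIR="${PROJECT_ROOT}/.debug-work"
DEBUG_LOG_DIR="${DEBUG_LOG_DIR:-${DEFAULT_DEBUG_LOG_DIR}}"

print_usage() {
    cat <<EOF
Options:
  --debug-log-dir DIR   Directory for captured debug logs
                        (default: ${DEFAULT_DEBUG_LOG_DIR})
EOF
}
```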

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 446ca419-15e2-44b0-98da-c957b9c3b2ba

📥 Commits

Reviewing files that changed from the base of the PR and between 4558a57 and bec5898.

📒 Files selected for processing (5)
  • deploy-scripts/deploy-clm.sh
  • deploy-scripts/lib/adapter.sh
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/common.sh
  • deploy-scripts/lib/sentinel.sh
🚧 Files skipped from review as they are similar to previous changes (3)
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/common.sh
  • deploy-scripts/lib/sentinel.sh

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
deploy-scripts/lib/adapter.sh (1)

110-112: Inefficient /dev/urandom read pattern.

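head /dev/urandom stops only after it has read 10 newline characters (head's default line count), so the amount of random data consumed varies from run to run and is far more than an 8-character suffix needs. For uniformly random bytes that is roughly 2.5 KB on average, all of which is piped through tr before the final head -c 8 discards most of it.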

♻️ Proposed fix
   # Generate random suffix to prevent namespace conflicts
   local random_suffix
-  random_suffix=$(head /dev/urandom | LC_ALL=C tr -dc 'a-z0-9' | head -c 8)
+  random_suffix=$(LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c 8)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy-scripts/lib/adapter.sh` around lines 110 - 112, Replace the unbounded
"head /dev/urandom" pattern used to generate random_suffix with a fixed-byte
read from /dev/urandom so you only consume the entropy you need; update the
random_suffix assignment (the random_suffix variable initialization) to read a
small, fixed number of bytes (e.g., one block) from /dev/urandom and then filter
to [a-z0-9] and cut to 8 characters, rather than piping an open-ended head, to
avoid buffering a large chunk and wasting CPU/entropy.
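A fixed-size read could look like the sketch below. The function name gen_suffix and the 64-byte chunk size are assumptions. Reading in bounded chunks and looping until enough characters survive the filter guarantees the requested length. It also avoids the SIGPIPE exit that tr can report under set -o pipefail when it is fed an endless stream and cut off by head -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a random [a-z0-9] suffix of the given length
# (default 8), reading /dev/urandom in small fixed-size chunks.
gen_suffix() {
    local len="${1:-8}" out=""
    while [ "${#out}" -lt "${len}" ]; do
        # Each pass reads exactly 64 bytes and keeps only a-z and 0-9.
        out="${out}$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'a-z0-9')"
    done
    printf '%s\n' "${out}" | cut -c "1-${len}"
}
```

For example, `random_suffix=$(gen_suffix 8)` would be a drop-in replacement for the assignment shown in the diff.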
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f8e5c39-9c43-4c14-b084-787bdb685d6d

📥 Commits

Reviewing files that changed from the base of the PR and between bec5898 and e17776f.

📒 Files selected for processing (5)
  • deploy-scripts/deploy-clm.sh
  • deploy-scripts/lib/adapter.sh
  • deploy-scripts/lib/api.sh
  • deploy-scripts/lib/common.sh
  • deploy-scripts/lib/sentinel.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • deploy-scripts/lib/sentinel.sh

@rafabene
Contributor

/lgtm

@openshift-ci

openshift-ci bot commented Mar 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rafabene

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 00fc4eb into openshift-hyperfleet:main Mar 23, 2026
5 checks passed